A Comparison of Compiler Tiling Algorithms

Authors

Gabriel Rivera

Chau-Wen Tseng

Abstract

Linear algebra codes contain data locality which can be exploited by tiling multiple loop nests. Several approaches to tiling have been suggested for avoiding conflict misses in low associativity caches. We propose a new technique based on intra-variable padding and compare its performance with existing techniques. Results show padding improves performance of matrix multiply by over 100% in some cases over a range of matrix sizes. Comparing the efficacy of different tiling algorithms, we discover rectangular tiles are slightly more efficient than square tiles. Overall, tiling improves performance from 0-250%. Copying tiles at run time proves to be quite effective.
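The abstract names three transformations: tiling, intra-variable padding, and run-time tile copying. The C sketch below illustrates all three on matrix multiply. It is not the authors' code. It assumes Fortran-style column-major storage, and the function names, tile sizes, and test values are hypothetical. Tile sizes are passed in by the caller rather than chosen by any of the tiling algorithms the paper compares.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Column-major access: element (i, j) of an array whose columns are ld doubles long. */
#define IDX(i, j, ld) ((size_t)(j) * (size_t)(ld) + (size_t)(i))

static int min_int(int a, int b) { return a < b ? a : b; }

/* Untiled reference: C += A * B for n x n column-major matrices. */
static void matmul_naive(int n, int ld, const double *A, const double *B, double *C) {
    for (int j = 0; j < n; j++)
        for (int k = 0; k < n; k++) {
            double b = B[IDX(k, j, ld)];
            for (int i = 0; i < n; i++)
                C[IDX(i, j, ld)] += A[IDX(i, k, ld)] * b;
        }
}

/* Tiled C += A * B. The i and k loops are blocked into ti x tk tiles of A
 * (rectangular whenever ti != tk). Each tile is first copied into the
 * contiguous buffer buf (at least ti * tk doubles). If the tile fits in
 * cache, its elements then typically occupy distinct cache sets and cannot
 * evict each other, even in a direct-mapped cache. The copy is reused
 * across all n columns of B and C. */
static void matmul_tiled_copy(int n, int ld, int ti, int tk, const double *A,
                              const double *B, double *C, double *buf) {
    assert(ti > 0 && tk > 0);
    for (int kk = 0; kk < n; kk += tk) {
        int kend = min_int(kk + tk, n);
        for (int ii = 0; ii < n; ii += ti) {
            int h = min_int(ii + ti, n) - ii;  /* rows in this tile */
            for (int k = kk; k < kend; k++)    /* copy column k of the tile */
                memcpy(&buf[(size_t)(k - kk) * h], &A[IDX(ii, k, ld)],
                       (size_t)h * sizeof(double));
            for (int j = 0; j < n; j++)
                for (int k = kk; k < kend; k++) {
                    double b = B[IDX(k, j, ld)];
                    const double *a = &buf[(size_t)(k - kk) * h];
                    for (int i = 0; i < h; i++)
                        C[IDX(ii + i, j, ld)] += a[i] * b;
                }
        }
    }
}

/* Hypothetical harness: multiplies two n x n matrices stored with leading
 * dimension n + pad (intra-variable padding) with both routines and returns
 * the largest absolute difference, or -1.0 if allocation fails. */
double tiled_max_error(int n, int pad, int ti, int tk) {
    int ld = n + pad;
    size_t sz = (size_t)ld * (size_t)n;
    double *A = malloc(sz * sizeof *A), *B = malloc(sz * sizeof *B);
    double *C1 = calloc(sz, sizeof *C1), *C2 = calloc(sz, sizeof *C2);
    double *buf = malloc((size_t)ti * (size_t)tk * sizeof *buf);
    double err = -1.0;
    if (A && B && C1 && C2 && buf) {
        for (size_t x = 0; x < sz; x++) {      /* small, exactly representable values */
            A[x] = (double)(x % 7) - 3.0;
            B[x] = 0.5 * (double)(x % 5);
        }
        matmul_naive(n, ld, A, B, C1);
        matmul_tiled_copy(n, ld, ti, tk, A, B, C2, buf);
        err = 0.0;
        for (int j = 0; j < n; j++)
            for (int i = 0; i < n; i++) {
                double d = C1[IDX(i, j, ld)] - C2[IDX(i, j, ld)];
                if (d < 0) d = -d;
                if (d > err) err = d;
            }
    }
    free(A); free(B); free(C1); free(C2); free(buf);
    return err;
}
```

For example, `tiled_max_error(100, 3, 32, 8)` pads each 100-element column to 103 doubles and uses rectangular 32 x 8 tiles. The tiled loop adds the k terms for each element of C in the same order as the untiled loop, so with these test values the call should return 0.0. Choosing the pad and tile dimensions so that a tile does not conflict with itself in a given cache is the problem the compared algorithms address. This sketch does not attempt that selection.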


Related papers

Communication-Minimal Tiling of Uniform Dependence Loops

Tiling is a loop transformation that a compiler uses to automatically create blocked algorithms that better exploit the memory hierarchy and reduce the communication overhead between processors. Motivated by existing results, this paper presents a conceptually simple approach to finding tilings with a minimal amount of communication between tiles. The development of almost all r...

Imperfectly-Nested Loops Yonghong

This paper presents an integrated compiler framework for tiling a class of nontrivial imperfectly-nested loops such that cache locality is improved. We develop a new memory cost model to analyze data reuse in terms of both the cache and the TLB, based on which we compute the tile size with or without array duplication. We determine whether to duplicate arrays for tiling by comparing the respect...

Code Tiling for Improving the Cache Performance of PDE Solvers

For SOR-like PDE solvers, loop tiling either helps little in improving data locality or hurts their performance. This paper presents a novel compiler technique called code tiling for generating fast tiled codes for these solvers on uniprocessors with a memory hierarchy. Code tiling combines loop tiling with a new array layout transformation called data tiling in such a way that a significant am...

Using a Dynamic Schedule to Increase the Performance of Tiling in Stencil Computations

A stencil computation determines the values of points in a grid of some dimensionality by repeatedly evaluating a given function of a grid point and its neighbors. The parallelization and optimization of stencil computations are the subject of ongoing research. The most prevalent approach is the subdivision of the iteration domain into smaller pieces, called tiles. We give an overview of a method t...

Reducing Data Communication Overhead for Doacross Loop Nests

If the iterations of a loop nest cannot be partitioned into independent sets, data communication for data dependences is inevitable when executing them on parallel machines. Such loop nests are referred to as Doacross loop nests. This paper is concerned with compiler algorithms for parallelizing Doacross loop nests for distributed-memory multicomputers. We present a metho...

Publication date: 1999